new algebird abstraction: Scan #739
Conversation
```scala
type Aux[-I, S, +O] = Scan[I, O] { type State = S }

implicit def applicative[I]: Applicative[({ type L[O] = Scan[I, O] })#L] = new ScanApplicative[I]
```
you can use kind-projector here?
I don't think we use that plugin. We could, but I'd rather not add a build dependency for the few cases where we have done this. This is mostly a demonstration anyway, since I expect that if you really want an `Applicative[Scan[I, ?]]` you will want a cats implementation; but anyway, this shows you how to make one.
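For reference, the kind-projector spelling that comment alludes to would look something like this (a sketch only; as noted above, the plugin is not a dependency of this build):

```scala
// With the kind-projector compiler plugin, the `?` placeholder replaces
// the explicit type lambda written out above.
implicit def applicative[I]: Applicative[Scan[I, ?]] = new ScanApplicative[I]
```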
```scala
import scala.collection.compat._
import scala.collection.generic.CanBuildFrom

object Scan {
```
How is this different than `Aggregator[I, M[_]: SemigroupK, O]`? I think the motivation here is that you want the `S` type to be a container so that `reduce` on `S` is collecting intermediate state. I believe that can be achieved by existing Aggregators by 1) making `S` a type constructor, and 2) making sure that `S` has an instance of `SemigroupK`.
@MansurAshraf First: great to be on a PR with you again! It's been a long time :-).

Second: the idea here is that this transforms streams to streams, whereas aggregators transform sequences to single values. I would look at the `apply` method (starting on line 229 of `Stream.scala` as of this writing). The idea is that, for each input in the stream, you have one output in the stream. For example, using `directFreeScan` from `ScanTest`, something like `directFreeScan(List('a', 'b', 'c'))` would result in `List("a", "ab", "abc")`.

Does that make sense?
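For concreteness, a minimal sketch of that behavior using plain Scala collections, with `scanLeft` standing in for the `Scan` API:

```scala
// Each input contributes exactly one output: the running concatenation.
val inputs = List('a', 'b', 'c')

// scanLeft("")(_ + _) yields List("", "a", "ab", "abc"); dropping the
// seed leaves exactly one output per input.
val outputs = inputs.scanLeft("")(_ + _).tail
// outputs == List("a", "ab", "abc")
```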
hey @jeff-stripe, it's been a while :-)

I will take a look at `Stream.scala`, but a cursory look at `def fromAggregator[A, B, C](aggregator: Aggregator[A, B, C], initState: B): Aux[A, B, C]` gives me the impression that `Scan[A, C] ~= Aggregator[A, S[_], (S[_], C)]`. `Scan` has a new function called `presentAndNextState` which returns both the `S` state type and the final output type `C`. If your aggregator type were `Aggregator[A, S[_], (S[_], C)]`, the `present` function in the existing aggregator would return the same value, I think.
@MansurAshraf In my example, the analogous `Aggregator` would just return `"abc"`, not `List("a", "ab", "abc")`. In other words, `Aggregator` is to the `reduce` method as `Scan` is to the `scanLeft` method.
Mansur, to a first approximation, `Aggregator[A, _, B]` is `Iterator[A] => Option[B]`, `MonoidAggregator[A, _, B]` is `Iterator[A] => B`, and `Scan[A, B]` is `Iterator[A] => Iterator[B]` (actually with an added law that the number of items in the result is exactly the same).

You could imagine another idea: an `ExpandingScan[A, B]` that is `Iterator[A] => Iterator[B]` but without the law that the iterators are the same size (then we could do things like filter, or concatMap, etc...).

Does that help explain it at all? Think of the `ExpandingScan` as the most general function you can do in a reducer in Hadoop.
Thanks jeff. CI failed:
```scala
/**
 * A Scan that returns the number N for the Nth input (starting from 0)
 */
val index: Aux[Any, Long, Long] = fromStreamLike(0L)(n => (n, n + 1))
```
this is really cool.
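A plain-collections sketch of what `index` computes: the state is the next index `n`, which is emitted for the current input and then advanced to `n + 1`:

```scala
// scanLeft threads the counter through; init drops the final state,
// which is never emitted.
val outputs = List("x", "y", "z").scanLeft(0L)((n, _) => n + 1).init
// outputs == List(0L, 1L, 2L)
```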
```scala
 * @return A scan which, when given [a_1, ..., a_n] outputs [c_1, ..., c_n] where
 *         c_i = aggregator.prepare(a_i) + ... + aggregator.prepare(a_1) + monoidAggregator.monoid.zero
 */
def fromMonoidAggregatorReverse[A, B, C](
```
same comment about adding a `def reverse` method to `MonoidAggregator`
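For intuition, a plain-collections sketch of the reversed-order semantics in the doc comment above, with string concatenation standing in for the monoid and the inputs standing in for their prepared values:

```scala
// Each output prepends the newest input: c_i = a_i + ... + a_1 + zero.
val outputs = List("a", "b", "c").scanLeft("")((acc, x) => x + acc).tail
// outputs == List("a", "ba", "cba")
```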
```scala
def joinWithIndex: Aux[I, (State, Long), (O, Long)] = join(Scan.index)
```
this shows the compositional power so well!
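A plain-collections sketch of what `joinWithIndex` should produce for the earlier `directFreeScan` example, pairing each output with its position:

```scala
// Join the running-concatenation outputs with the index scan's outputs.
val outputs = List('a', 'b', 'c')
  .scanLeft("")(_ + _).tail              // List("a", "ab", "abc")
  .zipWithIndex
  .map { case (o, i) => (o, i.toLong) }
// outputs == List(("a", 0L), ("ab", 1L), ("abc", 2L))
```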
```scala
def compose[P](scan2: Scan[O, P]): Aux[I, (State, scan2.State), P] = new Scan[I, P] {
```
maybe add comments to the methods: `compose` feeds each output of `this` into `scan2`, and `join` forks each input and feeds it into both `this` and `scan2`, etc...
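A sketch of those two shapes as plain functions, viewing a scan as `Iterator[A] => Iterator[B]` as in the approximation above:

```scala
// compose: each output of the first scan is fed into the second.
def compose[A, B, C](s1: Iterator[A] => Iterator[B],
                     s2: Iterator[B] => Iterator[C]): Iterator[A] => Iterator[C] =
  in => s2(s1(in))

// join: each input is forked into both scans, and outputs are paired up.
def join[A, B, C](s1: Iterator[A] => Iterator[B],
                  s2: Iterator[A] => Iterator[C]): Iterator[A] => Iterator[(B, C)] =
  in => {
    val (in1, in2) = in.duplicate
    s1(in1).zip(s2(in2))
  }
```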
I've added a bunch of comments. Let me know if you like the wording/framing of them.
Thanks @johnynek for the comments; I'm running out of steam for the day but will pick this up again tomorrow.
a really cool example ... just thinking aloud, that's not a request to add that.
Codecov Report

```
@@           Coverage Diff            @@
##          develop     #739    +/-   ##
===========================================
- Coverage    89.29%   89.26%   -0.03%
===========================================
  Files          114      115       +1
  Lines         8955     9011      +56
  Branches       330      335       +5
===========================================
+ Hits          7996     8044      +48
- Misses         959      967       +8
```

Continue to review full report at Codecov.
also, there is a (not very good) streaming median estimator where you keep track of the current median estimate; if the value is greater than that, you move the estimate up by a constant, and if it is less, you move it down. Since the true median should have about half the values above it and half below, this is a random walk that biases towards the median (so after a long enough warmup, it shouldn't be too far from the median).
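A minimal sketch of that estimator as a `scanLeft` (the starting estimate and step size here are arbitrary choices, not from the thread):

```scala
// Nudge the running estimate toward each observation by a fixed step;
// over time the walk biases toward the median.
def medianEstimates(xs: List[Double], step: Double = 0.01): List[Double] =
  xs.scanLeft(0.0) { (est, x) =>
    if (x > est) est + step
    else if (x < est) est - step
    else est
  }.tail
```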
looks like
@johnynek I think I've addressed all of your comments so far (but welcome more). Also, the example of z-score was totally one original motivation for this abstraction, from way back when I was working on feature generation.
Unrelated: when I was running tests locally to reproduce the above issue, I ran into the following unrelated test failure:

Should I just file an issue for that one? Or is this some sort of known numerical-precision kind of thing?
@jeff-stripe sorry about the flake. I think it is numerical precision, but it does come up periodically.
yeah, agree with your suggestion to move the order in `joinWithInput`, and suggested using `Aggregator.append`
@johnynek Pushed your suggested changes + new tests (I made them fail first) for
one last minor request, and I'll merge
Done. Thanks for the helpful feedback and thorough review! The result is much better than where it started.
A new abstraction, similar to `Aggregator` or `Fold`, that aims to reify the business logic associated with `scanLeft` into its own abstraction with its own combinators.

The testing here is pretty comprehensive, although I don't think `.zip` is covered. Moreover, I originally had a bunch of combinators named `.zipWithXXX` that I later renamed to `.joinWithXXX` to be more consistent with the types of `zip` and `join`. I do think it's worth bikeshedding naming on all of this, though.

For posterity, my non-work github is @eigenvariable.
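To summarize the shape of the abstraction discussed in this thread, here is a simplified sketch (the real `Scan` in this PR has an `Aux` alias, many more combinators, and richer collection support; `MiniScan` is a made-up name):

```scala
// A scan pairs an initial state with a step function that, for each
// input and the current state, produces one output and the next state.
trait MiniScan[-I, +O] {
  type State
  def initialState: State
  def presentAndNextState(i: I, s: State): (O, State)

  // One output per input, threading the state through, like scanLeft.
  def apply(in: List[I]): List[O] = {
    var s: State = initialState
    in.map { i =>
      val (o, next) = presentAndNextState(i, s)
      s = next
      o
    }
  }
}
```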